Improving Chinese Storing Text Retrieval Systems' Security via a Novel Maximal Prefix Coding

Authors

  • Dongyang Long
  • Weijia Jia
  • Pui-on Au
  • Ming Li
Abstract

Huffman coding has been widely used in data, image, and video compression. In this paper, a novel maximal prefix coding is introduced. The relationship between Huffman coding and optimal maximal prefix coding is discussed. We show that all Huffman coding schemes are optimal maximal prefix coding schemes and that, conversely, optimal maximal prefix coding schemes need not be Huffman coding schemes. Moreover, it is proven that, for any maximal prefix code C, there exists an information source I = (Σ, P) such that C is exactly a Huffman code for I. This essentially shows that the class of Huffman codes coincides with the class of maximal prefix codes. A case study of data compression is also given. Compared with Huffman coding, maximal prefix coding can be used not only for statistical modeling but also for dictionary methods. It is also well suited to large information retrieval systems and to improving their security.
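
The abstract does not spell out how a source I is obtained from a maximal prefix code C. One common way to illustrate the claim for a finite binary code is to give each symbol the probability 2^(-length of its codeword): the Kraft sum of a maximal prefix code equals 1, so these values form a probability distribution, and a Huffman code for it has the same codeword lengths as C. The Python sketch below checks this on a small example. It is an illustration under those assumptions, not the authors' algorithm, and every function name in it is hypothetical.

```python
import heapq
from fractions import Fraction
from itertools import count


def is_prefix_code(code):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(code.values())
    # In sorted order, a word that is a prefix of any other word is
    # immediately followed by a word that extends it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(code):
    """Exact Kraft sum of a binary code: sum of 2^(-len(w))."""
    return sum(Fraction(1, 2 ** len(w)) for w in code.values())


def is_maximal_prefix_code(code):
    """Assumption: finite binary code, where maximal means Kraft sum == 1."""
    return is_prefix_code(code) and kraft_sum(code) == 1


def dyadic_source(code):
    """Hypothetical helper: give each symbol probability 2^(-len(codeword))."""
    return {s: Fraction(1, 2 ** len(w)) for s, w in code.items()}


def huffman_lengths(probs):
    """Codeword lengths produced by the standard Huffman merge procedure."""
    tie = count()  # unique tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


# Example maximal prefix code over a hypothetical four-symbol alphabet.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
source = dyadic_source(code)
lengths = huffman_lengths(source)
```

For this example, `lengths` matches the codeword lengths of `code`. The sketch compares lengths only; it does not reconstruct the exact bit labels of C.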

Similar resources

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

Novel maximal coding compression techniques for the most important file of any full-text retrieval system, the text file, are discussed in this paper. As a continuation of our previous work, we show that the optimal maximal coding schemes coincide with the optimal uniquely decodable coding schemes. An efficient algorithm generating an optimal maximal code (or an optimal uniquely decodable code) i...

Improving Semistatic Compression Via Pair-Based Coding

In recent years, new semistatic word-based byte-oriented compressors, such as Plain and Tagged Huffman and the Dense Codes, have been used to improve the efficiency of text retrieval systems, while reducing the compressed collections to 30–35% of their original size. In this paper, we present a new semistatic compressor, called Pair-Based End-Tagged Dense Code (PETDC). PETDC compresses Englis...

Optimal Maximal Prefix Coding and Huffman Coding

Huffman coding has been widely used in data, image, and video compression. Novel maximal prefix coding different from the Huffman coding is introduced. Relationships between the Huffman coding and optimal maximal prefix coding are discussed. We show that all Huffman coding schemes are maximal prefix coding schemes and have the shortest average code word length among maximal prefix coding scheme...

Application of the Tightness Continuum Measure to Chinese Information Retrieval

Most word segmentation methods employed in Chinese Information Retrieval systems are based on a static dictionary or a model trained against a manually segmented corpus. These general segmentation approaches may not be optimal because they disregard information within semantic units. We propose a novel method for improving word-based Chinese IR, which performs segmentation according to the tigh...

Private Key based query on encrypted data

Nowadays, users of information systems tend to use a central server to reduce data transfer and maintenance costs. Since such a system is not fully trustworthy, users' data is usually kept encrypted. However, encryption is not a cure-all for security problems and cannot guarantee data security. In other words, there are some techniques that can endanger security of encrypted d...

Journal title:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 15  Issue 

Pages  -

Publication date 2002